
Linearized analysis versus optimization-based nonlinear analysis for nonlinear systems

Authors: Andrew Packard, Ufuk Topcu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

For autonomous nonlinear systems, stability and input-output properties in sufficiently small (infinitesimally small) neighborhoods of (linearly) asymptotically stable equilibrium points can be inferred from the properties of the linearized dynamics. On the other hand, generalizations of the S-procedure and sum-of-squares programming promise a framework potentially capable of generating certificates valid over quantifiable, finite-size neighborhoods of the equilibrium points. However, this procedure involves multiple relaxations (unidirectional implications). Therefore, it is not obvious whether sum-of-squares programming based nonlinear analysis can return a feasible answer whenever linearization-based analysis does. Here, we prove that, for a restricted but practically useful class of systems, the conditions in sum-of-squares programming based region-of-attraction, reachability, and input-output gain analyses are feasible whenever linearization-based analysis is conclusive. Besides the theoretical interest, such results may lead to nonlinear analysis tools that are computationally less demanding, though potentially more conservative, than direct use of sum-of-squares formulations.
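The linearization step mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the example system (a time-reversed Van der Pol oscillator), the function names, and the finite-difference step are all assumptions chosen for illustration. The code approximates the Jacobian at an equilibrium and checks whether a 2x2 matrix is Hurwitz using the trace and determinant test.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def vdp_reversed(x: Sequence[float]) -> Vector:
    """Hypothetical example system: time-reversed Van der Pol dynamics."""
    x1, x2 = x
    return [-x2, x1 + (x1 * x1 - 1.0) * x2]


def jacobian(f: Callable[[Sequence[float]], Vector],
             x0: Sequence[float], h: float = 1e-6) -> Matrix:
    """Central-difference approximation of the Jacobian of f at x0."""
    n = len(x0)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus = list(x0)
        x_minus = list(x0)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(n):
            # Entry (i, j) approximates d f_i / d x_j.
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return jac


def is_hurwitz_2x2(a: Matrix) -> bool:
    """A real 2x2 matrix has both eigenvalues in the open left
    half-plane exactly when its trace is negative and its
    determinant is positive."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return trace < 0.0 and det > 0.0
```

Calling `is_hurwitz_2x2(jacobian(vdp_reversed, [0.0, 0.0]))` reports whether the linearization at the origin is asymptotically stable. Such a check is only local: it says nothing about the size of the region of attraction, which is the gap the sum-of-squares certificates are meant to address.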

Crossref

Caltech Authors

Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints

Authors: Jie Fu, Ufuk Topcu
Publication date: 01/01/2014

We consider synthesis of control policies that maximize the probability of satisfying given temporal logic specifications in unknown, stochastic environments. We model the interaction between the system and its environment as a Markov decision process (MDP) with initially unknown transition probabilities. The solution we develop builds on the so-called model-based probably approximately correct Markov decision process (PAC-MDP) methodology. The algorithm attains an $\varepsilon$-approximately optimal policy with probability $1-\delta$ using samples (i.e. observations), time and space that grow polynomially with the size of the MDP, the size of the automaton expressing the temporal logic specification, $\frac{1}{\varepsilon}$, $\frac{1}{\delta}$ and a finite time horizon. In this approach, the system maintains a model of the initially unknown MDP, and constructs a product MDP based on its learned model and the specification automaton that expresses the temporal logic constraints. During execution, the policy is iteratively updated using observation of the transitions taken by the system. The iteration terminates in finitely many steps. With high probability, the resulting policy is such that, for any state, the difference between the probability of satisfying the specification under this policy and the optimal one is within a predefined bound.

Comment: 9 pages, 5 figures, accepted by 2014 Robotics: Science and Systems (RSS).
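The model-based idea in the abstract (estimate transition probabilities from observed transitions, then plan on the learned model) can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the class and function names are hypothetical, the states are assumed to already be product states (MDP state paired with automaton state), and the PAC-specific bookkeeping of "known" versus "unknown" state-action pairs is omitted.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set

State = Hashable
Action = Hashable


class EmpiricalMDP:
    """Transition model estimated from observed (s, a, s') triples."""

    def __init__(self) -> None:
        # counts[(s, a)][s_next] = number of times the transition was seen
        self.counts: Dict = defaultdict(lambda: defaultdict(int))

    def record(self, s: State, a: Action, s_next: State) -> None:
        self.counts[(s, a)][s_next] += 1

    def visits(self, s: State, a: Action) -> int:
        return sum(self.counts[(s, a)].values())

    def transition(self, s: State, a: Action) -> Dict[State, float]:
        """Empirical distribution over successors; empty if never tried."""
        n = self.visits(s, a)
        if n == 0:
            return {}
        return {t: c / n for t, c in self.counts[(s, a)].items()}


def reach_probability(model: EmpiricalMDP, states: Iterable[State],
                      actions: Iterable[Action], goal: Set[State],
                      horizon: int) -> Dict[State, float]:
    """Finite-horizon value iteration for the maximal probability of
    reaching `goal` under the learned model. Untried state-action
    pairs contribute probability 0 (a pessimistic assumption made
    here for simplicity)."""
    states = list(states)
    actions = list(actions)
    value = {s: 1.0 if s in goal else 0.0 for s in states}
    for _ in range(horizon):
        new_value = {}
        for s in states:
            if s in goal:
                new_value[s] = 1.0
                continue
            best = 0.0
            for a in actions:
                dist = model.transition(s, a)
                q = sum(p * value.get(t, 0.0) for t, p in dist.items())
                best = max(best, q)
            new_value[s] = best
        value = new_value
    return value
```

In a fuller treatment, `goal` would be the set of accepting product states, and the value function would be recomputed each time enough new observations arrive for some state-action pair.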

arXiv.org e-Print Archive

CiteSeerX

Deception in Optimal Control

Authors: Melkior Ornik, Ufuk Topcu
Publication date: 08/05/2018

In this paper, we consider an adversarial scenario where one agent seeks to achieve an objective and its adversary seeks to learn the agent's intentions and prevent the agent from achieving its objective. The agent has an incentive to try to deceive the adversary about its intentions, while at the same time working to achieve its objective. The primary contribution of this paper is to introduce a mathematically rigorous framework for the notion of deception within the context of optimal control. The central notion introduced in the paper is that of a belief-induced reward: a reward dependent not only on the agent's state and action, but also on the adversary's beliefs. Design of an optimal deceptive strategy then becomes a question of optimal control design on the product of the agent's state space and the adversary's belief space. The proposed framework allows for deception to be defined in an arbitrary control system endowed with a reward function, as well as with additional specifications limiting the agent's control policy. In addition to defining deception, we discuss design of optimally deceptive strategies under uncertainties in the agent's knowledge about the adversary's learning process. In the latter part of the paper, we focus on a setting where the agent's behavior is governed by a Markov decision process, and show that the design of optimally deceptive strategies under lack of knowledge about the adversary naturally reduces to previously discussed problems in control design on partially observable or uncertain Markov decision processes. Finally, we present two examples of deceptive strategies: a "cops and robbers" scenario and an example where an agent may use camouflage while moving. We show that optimally deceptive strategies in such examples follow the intuitive idea of how to deceive an adversary in the above settings.
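One way to picture a belief-induced reward is the toy sketch below. It is an assumption-laden illustration rather than the paper's formulation: the adversary is assumed to hold a discrete belief over candidate goals and to update it by Bayes' rule, and the reward is assumed to be the task reward minus a penalty proportional to the belief placed on the agent's true goal. All names and the linear penalty form are hypothetical.

```python
from typing import Dict, Hashable

Goal = Hashable


def bayes_update(belief: Dict[Goal, float],
                 likelihood: Dict[Goal, float]) -> Dict[Goal, float]:
    """Posterior over goals after one observed action.

    likelihood[g] is the assumed probability that an agent pursuing
    goal g would take the observed action. If the observation has
    zero probability under every goal, the prior is returned as is.
    """
    unnormalized = {g: belief[g] * likelihood.get(g, 0.0) for g in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(belief)
    return {g: w / total for g, w in unnormalized.items()}


def belief_induced_reward(task_reward: float,
                          belief: Dict[Goal, float],
                          true_goal: Goal,
                          penalty: float = 1.0) -> float:
    """Hypothetical reward that depends on the adversary's belief:
    the more mass the adversary places on the true goal, the lower
    the reward."""
    return task_reward - penalty * belief[true_goal]
```

Under this sketch, planning takes place over pairs of agent state and adversary belief, which matches the product space described in the abstract: each action changes both the physical state and, through `bayes_update`, the belief that enters the reward.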

arXiv.org e-Print Archive

Crossref